D1108-D1112 Nucleic Acids Research, 2012, Vol. 40, Database issue 
doi:10.1093/nar/gkr1063



Published online 21 November 2011 



DAMPD: a manually curated antimicrobial 
peptide database 

Vijayaraghava Seshadri Sundararajan 1 , Musa Nur Gabere 1 , Ashley Pretorius 1 , 
Saleem Adam 1 , Alan Christoffels 1 , Minna Lehvaslaiho 2 , John A. C. Archer 2 and
Vladimir B. Bajic 2,*

1 South African National Bioinformatics Institute, The University of the Western Cape, 7535 Bellville, South Africa 
and 2 Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal
23955-6900, Kingdom of Saudi Arabia 

Received September 30, 2011; Revised and Accepted October 26, 2011



ABSTRACT 

The demand for antimicrobial peptides (AMPs) is rising because of
the increased occurrence of pathogens that are tolerant or
resistant to conventional antibiotics. Since naturally occurring
AMPs could serve as templates for the development of new
anti-infectious agents to which pathogens are not resistant, a
resource that contains relevant information on AMPs is of great
interest. To that end, we developed the Dragon Antimicrobial
Peptide Database (DAMPD, http://apps.sanbi.ac.za/dampd), which
contains 1232 manually curated AMPs. DAMPD is an update of, and a
replacement for, the ANTIMIC database. In DAMPD, an integrated
interface allows simple querying based on taxonomy, species, AMP
family, citation, keywords and a combination of search terms and
fields (Advanced Search). A number of tools, such as BLAST,
ClustalW, HMMER, Hydrocalculator, SignalP and an AMP predictor, as
well as a number of other resources that provide additional
information about the results, are integrated into DAMPD to
augment biological analysis of AMPs.

INTRODUCTION 

Antimicrobial peptides (AMPs) are recognized for their significant
role in the innate immune response and are found in bacteria,
fungi, animals and plants (1-7). AMPs are short [6-100 amino acid
residues (8,9)], ribosomally produced peptides that are
post-translationally activated by proteolytic cleavage. With few
exceptions, AMPs are cationic and possess a significant proportion
(>30%) of hydrophobic residues (10,11). Their secondary structure
generally adopts one of four structural motifs: (i) an α-helical
structure, (ii) a β-stranded structure due to the presence of two
or more disulfide bonds, (iii) a β-hairpin structure or loop due
to the presence of a single disulfide bond and/or cyclization of
the peptide chain and (iv) an extended structure (12). Mature AMPs
form amphipathic structures that associate with the cell membrane
via electrostatic interactions between positively charged AMP
regions and negatively charged phospholipids (4,14), an
association that is thought to be necessary for antimicrobial
activity.
However, AMP modes of action can be divided into
membrane-disruptive and non-disruptive categories (8,13-17),
indicating that multiple modes of action exist following membrane
association. In addition, mammalian AMPs exhibit chemokine-like
and immunomodulatory activities (18,19) that can integrate innate
and adaptive immune responses to microbial infection. Measurements
of non-synonymous and synonymous mutation rates in mammalian AMP
exons and comparative genomic studies indicate that mammalian AMP
genes are under positive selection and are among the most rapidly
evolving groups of mammalian genes known (20). The combination of
broad-spectrum antimicrobial activity targeted at non-protein
cellular components with localized, high-level expression at the
site of infection makes AMPs highly effective antimicrobial agents
with significant potential as a source of new antimicrobial drugs
(21), such as new, more effective antitubercular agents active
against multidrug-resistant (MDR) and extensively drug-resistant
(XDR) Mycobacterium tuberculosis complex pathogens (22).

Although several AMP-related databases exist, such as APD (23),
AMSdb (14), BACTIBASE (24), Defensin knowledgebase (25), PenBase
(26), Peptaibol Database (27), SAPD (28), AMPer (29), CyBase (30),
BAGEL (31), Minicope (The Innate immunity defense peptides
MiniCOPE Dictionary,
http://www.copewithcytokines.de/cope.cgi?key=Innate%20immunity%20defense%20peptides%20MiniCOPE%20Dictionary),
CAMP (32), PhytAMP (33) and RAPD (34), each has certain
shortcomings, such as covering only specific AMP families or
containing a limited collection of AMP families; many lack
manually curated AMPs or do not have tools for exploration of
relevant AMP characteristics (detailed in Supplementary Table S1,
in Section 5).

These observations, combined with the frequent updating of peptide
information in major databases
(http://apps.sanbi.ac.za/dampd/Link.php), motivated us to retrieve
peptides from UniProt (36) and GenBank (37), select those that are
AMPs based on manual curation and develop a new database, the
Dragon Antimicrobial Peptide Database (DAMPD), as an update and
extension of our previously published ANTIMIC database (35). DAMPD
contains information on UniProt reviewed and UniProt non-reviewed
(i.e. putative) natural AMPs. The utility of DAMPD is enriched by
the integration of several tools, such as BLAST (38,39), ClustalW
(40), HMMER (42), Hydrocalculator (43), SignalP (44,45) and an AMP
predictor, as well as links to several resources that provide
additional information on results generated by DAMPD. These can be
used to explore characteristics of AMPs and to enhance the search
for novel AMPs, thus supporting biological analysis of AMPs.



POPULATION OF DATABASE 

AMPs were retrieved from UniProt using the query (search:
"Antimicrobial [KW-0929]") AND (existence: evidence at protein
level OR existence: evidence at transcript level). On 9 September
2011 this query retrieved 1483 (PE1) and 682 (PE2) peptides from
UniProt. We manually curated these entries, selecting only
peptides that are experimentally validated. This resulted in 1232
(out of 2165) manually curated peptides with experimentally
confirmed antimicrobial activity for inclusion in the DAMPD
database. We used the latest information on each of these 1232
peptides from UniProt and rebuilt our database as version 'DAMPD
DB 09_Sep_2011 (1232)'. To populate DAMPD, we searched for AMPs in
different databases, as well as in journals. If a database entry
has a keyword indicating that the peptide has antimicrobial
qualities, this may be an assumption derived from sequence
similarity. Antimicrobial activity can be sensitive to even slight
changes in the peptide, and a single amino acid difference can
render the peptide inactive. Since we wanted only true AMPs, we
checked the research articles and made sure that each peptide did
in fact have experimentally proven antimicrobial activity. We also
added peptides that UniProt has not yet annotated as
antimicrobial. All annotations were verified against the original
articles.
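
The two-stage selection described above (an automatic pass on keyword and protein existence level, followed by manual curation) can be sketched as follows. With the reported counts, 1483 + 682 = 2165 entries passed the automatic stage and 1232 (about 57%) survived curation. This is a minimal sketch, not the DAMPD pipeline itself: the record fields, the `activity_verified` flag and the placeholder accessions are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    accession: str            # placeholder ID, not a real UniProt accession
    keywords: frozenset       # keyword tags attached to the entry
    existence: int            # UniProt protein existence level (1 = protein, 2 = transcript)
    activity_verified: bool   # hypothetical flag set by a curator after reading the article


def automatic_pass(entries):
    """Keep entries tagged 'Antimicrobial' with evidence at protein (1) or transcript (2) level."""
    return [e for e in entries
            if "Antimicrobial" in e.keywords and e.existence in (1, 2)]


def curated_pass(entries):
    """Keep only entries whose activity a curator confirmed in the literature."""
    return [e for e in entries if e.activity_verified]


# Invented example entries covering the cases discussed in the text.
candidates = [
    Candidate("X00001", frozenset({"Antimicrobial"}), 1, True),
    Candidate("X00002", frozenset({"Antimicrobial"}), 2, False),  # keyword from similarity only
    Candidate("X00003", frozenset({"Antimicrobial"}), 3, False),  # inferred from homology
    Candidate("X00004", frozenset({"Toxin"}), 1, False),          # not tagged as antimicrobial
]

retrieved = automatic_pass(candidates)
accepted = curated_pass(retrieved)
print([e.accession for e in retrieved])  # ['X00001', 'X00002']
print([e.accession for e in accepted])   # ['X00001']
```

Keeping the curation decision as an explicit field separates what can be automated (the keyword and evidence filter) from what requires a domain expert.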



DATABASE SYSTEM 

DAMPD is a collection of manually curated AMPs. An integrated
system driven by MySQL (5.0), PHP (5.2.4) and Perl (5.8.8) was
developed to handle the storage of information on these peptides.
Each peptide in the DAMPD database has a unique accession number
(e.g. DAMPD:0001). AMPs in DAMPD include peptides containing the
precursor (477 AMPs), as well as mature peptide parts (755 AMPs).
Peptide entries are cross-referenced to external resources and
linked to graphical views. This section describes the
sub-databases, multiple catalogs, tools and graphical views.

Sub-databases 

AMPs are retrieved from UniProt, where entries are categorized as
reviewed or non-reviewed. Our 1232 AMPs are thus split into 1113
reviewed and 119 non-reviewed entries, which we named
Swiss-Prot_AMP and TrEMBL_AMP, respectively. They form two
sub-databases in DAMPD, one with reviewed and the other with
non-reviewed UniProt AMPs. The total collection of these AMPs is
termed 'UniProt_AMP'. We could not obtain any new peptide (not
already included in the above) from GenBank. Consequently, DAMPD
contains all AMPs from UniProt and GenBank.

Multiple search capabilities 

DAMPD has six search capabilities. These six search tools can
operate as independent search engines to interrogate the database
or be executed as part of a more complex query. Five of these
search utilities are based on catalogs created as vocabularies of
terms from the taxonomies, AMP families, species, keywords and
citations of the 1232 peptide entries, to ease browsing of the
database.

Taxonomy catalog. Organisms are classified in a hierarchical tree
structure. The taxonomy database contains every node (taxon) of
the tree.

Species catalog. UniProt provides species annotation for each
peptide, which we compiled into a catalog for specific search by
species.

Keyword catalog. UniProt entries are tagged with 
keywords that can be used to retrieve particular subsets 
of entries. 

Family catalog. UniProt general annotation provides family details
for known AMPs. We collected family, superfamily and subfamily
information and built the family catalog.

Citation catalog. UniProt records each publication with its title
(RT, example: protein interaction), author names (RA, examples:
Ashburner, Sanger F., Pierson L.S. III), journal (RL, example:
J. Exp. Biol.) and year of publication (YR, example: 1951); these
fields are cataloged so that any of them can be searched.

Advanced search. This search category allows a combination of
search terms, search fields and search values. Users can query the
database using field names that are not listed in the other
catalogs.
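
One way to picture the catalogs is as inverted indexes that map each vocabulary term to the accessions carrying it, with a combined query intersecting the matches from several catalogs. The sketch below assumes this design; the paper does not describe the actual MySQL schema or PHP query logic, and the records, field names and function names are hypothetical.

```python
from collections import defaultdict

# Invented records: accessions follow the DAMPD:NNNN pattern, but the field
# values are illustrative and not taken from the database.
RECORDS = [
    {"accession": "DAMPD:0001", "species": "Homo sapiens",
     "family": "Defensin", "keywords": {"Antimicrobial", "Defensin"}},
    {"accession": "DAMPD:0002", "species": "Homo sapiens",
     "family": "Cathelicidin", "keywords": {"Antimicrobial"}},
    {"accession": "DAMPD:0003", "species": "Bos taurus",
     "family": "Defensin", "keywords": {"Antimicrobial", "Fungicide"}},
]


def build_catalog(records, field):
    """Map each (lower-cased) term of `field` to the set of accessions carrying it."""
    catalog = defaultdict(set)
    for rec in records:
        value = rec[field]
        terms = [value] if isinstance(value, str) else value
        for term in terms:
            catalog[term.lower()].add(rec["accession"])
    return catalog


def combined_search(catalogs, criteria):
    """Return accessions matching every (field, term) pair (logical AND)."""
    hits = None
    for field, term in criteria:
        matched = catalogs[field].get(term.lower(), set())
        hits = matched if hits is None else hits & matched
    return sorted(hits or [])


catalogs = {f: build_catalog(RECORDS, f) for f in ("species", "family", "keywords")}

print(combined_search(catalogs, [("family", "Defensin")]))
# ['DAMPD:0001', 'DAMPD:0003']
print(combined_search(catalogs, [("species", "Homo sapiens"), ("family", "Defensin")]))
# ['DAMPD:0001']
```

Each catalog can be browsed on its own (the keys are the vocabulary), while a combined query reuses the same indexes.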

Tools and resources 

Several analytical tools and relevant resources are integrated
into DAMPD to support exploration and biological analysis of AMPs
(http://apps.sanbi.ac.za/dampd/BioTools.php). For the same purpose
we provide links to a number of additional resources.

Standalone version of integrated tools. The DAMPD database tools
can also operate on a standalone basis. For example, one can align
antimicrobial sequences or any other protein/DNA sequences using
ClustalW (40). NJplot (41) is used to draw a phylogenetic tree of
the aligned sequences generated by ClustalW (40). HMMER (42)
allows the user to tentatively classify unknown sequences into a
particular antimicrobial family in one of two ways: (i) using the
27 predefined antimicrobial libraries of profiles or (ii) using
profiles generated by the user. The physicochemical properties of
the peptides, such as hydrophobicity, net charge, percentage of
hydrophobic residues, mean hydrophobicity and mean hydrophobic
moment, can be calculated using the Hydrocalculator (43).
Hydrophobicity of amino acid residues influences protein folding,
protein subunit interactions, binding to receptors and
interactions of proteins with cell membranes (43). The hydrophobic
moment of a sequence segment indicates how the hydrophobicities of
its constituent residues are distributed when the segment is
folded into a particular conformation, i.e. α-helix or β-helix.
SignalP (44,45) can be used to predict the signal peptide cleavage
site of peptides from different organisms. This is useful for
determining the mature part of the peptide that has the activity.
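
The quantities listed above can be illustrated with a short calculation. The sketch below is not the Hydrocalculator implementation: it assumes the Eisenberg consensus hydrophobicity scale, a crude charge model (+1 for Lys/Arg, -1 for Asp/Glu, ignoring His and the termini), an assumed set of hydrophobic residues and the usual 100° per-residue rotation for an α-helix. The test peptide is an arbitrary cationic sequence chosen for illustration.

```python
import math

# Eisenberg consensus hydrophobicity scale (assumed; the Hydrocalculator may
# use a different scale).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}
HYDROPHOBIC = set("AVILMFWC")  # assumption: residues counted as hydrophobic


def net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")


def hydrophobic_fraction(seq):
    """Fraction of residues that belong to the assumed hydrophobic set."""
    return sum(a in HYDROPHOBIC for a in seq) / len(seq)


def mean_hydrophobicity(seq):
    """Average per-residue hydrophobicity on the chosen scale."""
    return sum(EISENBERG[a] for a in seq) / len(seq)


def mean_hydrophobic_moment(seq, angle_deg=100.0):
    """Mean hydrophobic moment; 100 degrees per residue models an ideal alpha-helix."""
    delta = math.radians(angle_deg)
    s = sum(EISENBERG[a] * math.sin(i * delta) for i, a in enumerate(seq))
    c = sum(EISENBERG[a] * math.cos(i * delta) for i, a in enumerate(seq))
    return math.hypot(s, c) / len(seq)


peptide = "KLAKLAKKLAKLAK"  # arbitrary illustrative sequence
print(net_charge(peptide))                      # 6
print(round(hydrophobic_fraction(peptide), 3))  # 0.571
print(mean_hydrophobicity(peptide))
print(mean_hydrophobic_moment(peptide))
```

A large moment relative to the mean hydrophobicity indicates that hydrophobic and polar residues segregate onto opposite faces of the helix, which is the amphipathic arrangement described in the Introduction.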

Catalog-integrated tools. Each catalog page contains integrated
tools: BLAST, ClustalW, HMMER, Hydrocalculator and SignalP. When
the user performs a search, the result page shows a summary of
peptide information. The user can choose to process the entire
result set or select individual sequences from it. The integrated
tools are implemented in this framework.
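
Passing either the whole result set or a user-selected subset to a sequence tool amounts to writing the chosen sequences in a format the tool accepts, typically FASTA. A minimal sketch, assuming results are available as (accession, sequence) pairs; the function name, the accessions and the sequences are hypothetical and do not reflect how DAMPD hands data to its tools internally.

```python
def to_fasta(results, selected=None, width=60):
    """Return FASTA text for all results, or only for accessions in `selected`."""
    lines = []
    for accession, sequence in results:
        if selected is not None and accession not in selected:
            continue
        lines.append(f">{accession}")
        # Wrap the sequence at `width` residues per line.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "".join(line + "\n" for line in lines)


# Invented result set; the sequences are placeholders.
results = [
    ("DAMPD:0001", "KLAKLAKKLAKLAK"),
    ("DAMPD:0002", "GLFKKLLKAG"),
]

print(to_fasta(results), end="")                           # entire result set
print(to_fasta(results, selected={"DAMPD:0002"}), end="")  # one selected entry
```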

Other resources. The peptides retrieved in DAMPD searches can be
linked to other databases that provide additional information on
them. These resources are described here: ProtParam (46) computes
the physico-chemical properties of a sequence. Compute pI/MW
(47,48) requires the user to choose or enter a Swiss-Prot/TrEMBL
accession number. ProtScale (46) generates a profile of each type
of amino acid on a protein. PeptideMass (46,49) uses a
Swiss-Prot/TrEMBL accession number assigned to a protein to
generate peptide information. PeptideCutter (46) requires the end
user to enter an accession number used by Swiss-Prot/TrEMBL to
uniquely identify proteins. ModBase (50) provides predicted
protein structure models. SMART (51), the Simple Modular
Architecture Research Tool, maps a protein sequence to its catalog
of target domains. InterPro (52) uses a host of member databases
to generate protein signatures, which are used as a basis to
identify distant relationships between potentially novel
sequences. Pfam (53) is a database of protein family
classifications, protein domain data and multiple sequence
alignments generated using hidden Markov models. Prosite (54) is a
database that contains descriptions and documentation relating to
amino acid profiles, protein domains, families and functional
sites. ProtoNet (55) is a database of computationally derived
protein structures, which have been clustered and then
hierarchically structured using data derived from
Swiss-Prot/TrEMBL.

DAMPD VERSUS ANTIMIC 

We transformed and upgraded the ANTIMIC database, which contained
1799 peptides from UniProt and GenBank, into the DAMPD resource.
DAMPD aims to be a one-stop web portal for AMPs. The capabilities
of DAMPD are enriched with several tools that can enhance AMP
studies. One of these is the AMP predictor (based on support
vector machines, SVM), which can classify a peptide very
accurately into one of 27 AMP families, a feature that currently
no other tool or database has. Users can search the database for
'Reviewed' peptides, 'Non-Reviewed' peptides or both classes.
Catalogs help users to search the database from different aspects:
'Keywords, Taxonomy, Citations, Family and Species'. The system
can be updated with the latest information on the 1232 peptides
whenever these UniProt entries are updated. 'Help' pages explain
how to access and use the database. 'Links to other antimicrobial
databases' provide direct access to information from other
relevant AMP resources. We further enabled users either to view
the results of their queries on their computer screen or to
receive them by email. We regularly download all peptides with
'keyword: antimicrobial' from UniProt and GenBank, manually verify
them as explained earlier and add only those peptide entries that
are experimentally validated.

DAMPD VERSUS OTHER AMP DATABASES 

Supplementary Table S1 provides a short comparison of DAMPD and
currently available AMP databases and resources. Significant
improvements available in DAMPD include the combination of BLAST,
ClustalW, Hydrocalculator, SignalP, AMP prediction using HMMER and
an SVM-based predictor of AMPs, all operating on a database of
experimentally validated peptides. These features are combined
with multilevel catalog searching. The current DAMPD version has
145 keyword catalog entries and 943 taxonomy catalog entries. The
AMP family catalog has 128 entries covering main families and
subfamilies, and the species catalog has 406 entries.

CONCLUSION

Most other 'antimicrobial' databases to date are becoming outdated
and are not regularly maintained. We integrated retrieval,
creation of catalogs and database versioning into a semi-automatic
process that allows us to update DAMPD within a day. This process
automatically retrieves new peptides in the 'antimicrobial'
category, but final inclusion in DAMPD requires checking by a
domain expert. DAMPD will be updated regularly on a bi-monthly
basis. In the near future we intend to add text-mining
capabilities to identify, from texts, potential AMPs that are not
yet annotated as AMPs. We believe that DAMPD will be a useful
resource for researchers in this domain.
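
The semi-automatic update cycle can be thought of as a set comparison between the accessions already in the database and those returned by a fresh retrieval: known entries are refreshed automatically, while previously unseen entries are queued for the domain expert. The sketch below is one assumed way to organize this step; the function name, the `rejected` list of entries already declined by a curator and the placeholder accessions are all hypothetical.

```python
def plan_update(current, retrieved, rejected=()):
    """Split a fresh retrieval into automatic and manual work.

    current   -- accessions already in the curated database
    retrieved -- accessions returned by the latest automatic retrieval
    rejected  -- accessions a curator has already examined and declined
    """
    current, retrieved, rejected = set(current), set(retrieved), set(rejected)
    return {
        "refresh": sorted(current & retrieved),             # update annotations automatically
        "review": sorted(retrieved - current - rejected),   # needs a domain expert
        "missing": sorted(current - retrieved),             # no longer returned by the query
    }


plan = plan_update(
    current={"A1", "A2", "A3"},
    retrieved={"A2", "A3", "A4", "A5"},
    rejected={"A5"},
)
print(plan["refresh"])  # ['A2', 'A3']
print(plan["review"])   # ['A4']
print(plan["missing"])  # ['A1']
```

Only the "review" list requires manual effort, which is consistent with an update that can be completed within a day.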



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR online: 
Supplementary Table S1.